AITopics | greedy decoding

Collaborating Authors

greedy decoding

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities

Misra, Diganta, Islah, Nizar, May, Victor, Rauby, Brice, Wang, Zihan, Gehring, Justine, Orvieto, Antonio, Chaudhary, Muawiz, Muller, Eilif B., Rish, Irina, Kahou, Samira Ebrahimi, Caccia, Massimo

arXiv.org Artificial IntelligenceJul-23-2025

The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks provide valuable insights, they typically lack execution-based evaluation for generating code compliant with specific library versions. To address this, we introduce GitChameleon 2.0, a novel, meticulously curated dataset comprising 328 Python code completion problems, each conditioned on specific library versions and accompanied by executable unit tests. GitChameleon 2.0 rigorously evaluates the capacity of contemporary large language models (LLMs), LLM-powered agents, code assistants, and RAG systems to perform version-conditioned code generation that demonstrates functional accuracy through execution. Our extensive evaluations indicate that state-of-the-art systems encounter significant challenges with this task; enterprise models achieving baseline success rates in the 48-51% range, underscoring the intricacy of the problem. By offering an execution-based benchmark emphasizing the dynamic nature of code libraries, GitChameleon 2.0 enables a clearer understanding of this challenge and helps guide the development of more adaptable and dependable AI code generation methods. We make the dataset and evaluation code publicly available at https://github.com/mrcabbage972/GitChameleonBenchmark.

large language model, machine learning, programming language, (22 more...)

arXiv.org Artificial Intelligence

2507.12367

Country:

North America > Canada > Quebec (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars

Prabhu, Sumanth

arXiv.org Artificial IntelligenceAug-19-2024

Self-ensembling techniques with diverse reasoning paths such as Self-Consistency have demonstrated remarkable performance gains in text generation with Large Language Models (LLMs). However, such techniques depend on the availability of an accurate answer extraction process to aggregate across multiple outputs. Moreover, they acquire higher inference cost, in comparison to Greedy Decoding, due to generation of relatively higher number of output tokens. Research has shown that the free form text outputs from Self-Consistency can be aggregated reliably using LLMs to produce the final output. Additionally, recent advancements in LLM inference have demonstrated that usage of diverse exemplars in prompts have the ability to induce diversity in the LLM outputs. Such proven techniques can be easily extended to self-ensembling based approaches to achieve enhanced results in text generation. In this paper, we introduce PEDAL (Prompts based on Exemplar Diversity Aggregated using LLMs), a hybrid self-ensembling approach, that combines the strengths of diverse exemplar based prompts and LLM based aggregation to achieve improvement in overall performance. On the publicly available SVAMP and ARC datasets, our experiments reveal that PEDAL can achieve better accuracy than Greedy Decoding based strategies with lower inference cost compared to Self Consistency based approaches.

dataset, greedy decoding, language model, (15 more...)

arXiv.org Artificial Intelligence

2408.08869

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU

Galvez, Daniel, Bataev, Vladimir, Xu, Hainan, Kaldewey, Tim

arXiv.org Artificial IntelligenceJun-6-2024

The vast majority of inference time for RNN Transducer (RNN-T) models today is spent on decoding. Current state-of-the-art RNN-T decoding implementations leave the GPU idle ~80% of the time. Leveraging a new CUDA 12.4 feature, CUDA graph conditional nodes, we present an exact GPU-based implementation of greedy decoding for RNN-T models that eliminates this idle time. Our optimizations speed up a 1.1 billion parameter RNN-T model end-to-end by a factor of 2.5x. This technique can applied to the "label looping" alternative greedy decoding algorithm as well, achieving 1.7x and 1.4x end-to-end speedups when applied to 1.1 billion parameter RNN-T and Token and Duration Transducer models respectively. This work enables a 1.1 billion parameter RNN-T model to run only 16% slower than a similarly sized CTC model, contradicting the common belief that RNN-T models are not suitable for high throughput inference. The implementation is available in NVIDIA NeMo.

algorithm, cuda graph, implementation, (15 more...)

arXiv.org Artificial Intelligence

2406.03791

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Hardware (0.88)
Information Technology > Graphics (0.88)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Add feedback

Faster Text Generation with TensorFlow and XLA

#artificialintelligenceAug-29-2022, 19:11:02 GMT

TL;DR: Text Generation on transformers using TensorFlow can now be compiled with XLA. It is up to 100x faster than before, and even faster than PyTorch -- check the colab below! As the quality of large language models increased, so did our expectations of what those models could do. Especially since the release of OpenAI's GPT-2, models with text generation capabilities have been in the spotlight. And for legitimate reasons -- these models can be used to summarize, translate, and they even have demonstrated zero-shot learning capabilities on some language tasks.

argument, text generation, xla, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback